In this review we discuss transposon-insertion sequencing, 
variously known in the literature as TraDIS, Tn-seq, INSeq 
and HITS. By monitoring a large library of single transposon- 
insertion mutants with high-throughput sequencing, these 
methods can rapidly identify genomic regions that contribute 
to organismal fitness under any condition assayable in the 
laboratory with exquisite resolution. We discuss the various 
protocols that have been developed and methods for analysis. 
We provide an overview of studies that have examined the 
reproducibility and accuracy of these methods, as well as 
studies showing the advantages offered by the high resolution 
and dynamic range of high-throughput sequencing over 
previous methods. We review a number of applications in 
the literature, from predicting genes essential for in vitro 
growth to directly assaying requirements for survival under 
infective conditions in vivo. We also highlight recent progress 
in assaying non-coding regions of the genome in addition to 
known coding sequences, including the combining of RNA- 
seq with high-throughput transposon mutagenesis. 



Introduction 

A common approach to identifying genomic regions involved 
in survival under a particular set of conditions is to screen large 
pools of mutants simultaneously. This can be done with defined 
mutants; 1,2 however, the construction of defined mutant libraries 
is labor intensive and requires accurate genomic annotation, which 
can be particularly difficult to define for non-coding regions. An 
alternative to defined libraries is the construction and analysis 
of random transposon-insertion libraries. The original applica- 
tion of this method used DNA hybridization to track uniquely 
tagged transposon-insertions in Salmonella enterica serovar 
Typhimurium over the course of BALB/c mouse infection. 3 
DNA hybridization was eventually superseded by methods that 
used microarray detection of the genomic DNA flanking inser- 
tion sites, variously known as TraSH, MATT and DeADMAn 
(reviewed in ref. 4). However, these methods suffered from many 
of the problems microarrays generally suffer from: difficulty 
detecting low-abundance transcripts, mis-hybridization, probe 
saturation and difficulty identifying insertion sites precisely. 
The application of high-throughput sequencing to the challenge 
of determining insertion location and prevalence solves 
many of these problems. Interestingly, the first application of 
transposon-insertion sequencing, developed by Hutchison et al., 
actually predates the development of microarray-based methods. 5 
However, this was applied to libraries of only approximately 1,000 
transposon mutants in highly reduced Mycoplasma genomes, and 
the difficulty of sequencing at the time prevented widespread 
adoption or high resolution. Modern high-throughput sequenc- 
ing technology allows the methods discussed in this review to 
routinely monitor as many as one million mutants simultane- 
ously in virtually any genetically tractable microorganism. 

Protocols. Several methods were developed concurrently 
for high-throughput sequencing of transposon-insertion sites: 
TraDIS, 6 INSeq, 7 HITS 8 and Tn-seq, 9 followed by Tn-seq Circle 10 
and refinements to the INSeq protocol. 11 All of these protocols 
follow the same basic workflow with minor variations (see Fig. 1, 
Table 1): transposon mutagenesis and construction of pools of 
single insertion mutants, followed by enrichment of 
transposon-insertion junctions and sequencing. In some protocols, 
a purification step either precedes or follows PCR enrichment 
before sequencing. 

Transposon mutagenesis. Most studies have used either 
Tn5 or mariner transposon derivatives. Tn5 originated as a 
bacterial transposon which has been adapted for laboratory 
use. Large-scale studies have shown that Tn5, while not showing 
any strong preference for regional GC-content, does have a 
weak preference for a particular insertion motif. 12-14 
Transposon-insertion sequencing studies performed with Tn5 
transposons in S. enterica serovars have reported a slight bias 
toward AT-rich sequence regions. 6,15 However, this preference 
does not appear to be a major obstacle to analysis given the 
extremely high insertion densities obtained with this 
transposon 6,15,16 (see Table 1). 
Additionally, Tn5 has been shown to be active in a wide range of 
bacterial species, though the number of transformants obtained 
can vary significantly depending on the transformation effi- 
ciency of the host. 

Mariner/Himar1 transposons, on the other hand, originate from 
eukaryotic hosts and have an absolute requirement for TA bases 
at their integration site, 17,18 with no other known bias besides a 
possible preference for bent DNA. 17 This can be a disadvantage in 
that it limits the number of potential insertion sites, particularly 
in GC-rich sequence. However, this specificity can also be used 
in the prediction of gene essentiality in near-saturated libraries: 
as every potential integration site is known and the probability of 



www.landesbioscience.com 



RNA Biology 



1161 



[Figure 1 panel labels: in vivo (TraDIS, INSeq, Tn-seq Circle) or in vitro (Tn-seq, HITS) transposition; selection; archiving of clonal mutants (INSeq); pool mutants; extract genomic DNA; fragmentation by restriction digest (Tn-seq, Tn-seq Circle, INSeq) or physical shearing (TraDIS, HITS); adapter ligation; purification (HITS, INSeq, Tn-seq Circle); PCR enrichment of insertions; sequencing and mapping.] 



Figure 1. An illustration of the workflow typical of transposon-insertion sequencing protocols. Transposons are represented by pink lines, sequenc- 
ing adaptors by blue, genomic DNA by black and PCR primers by green. Mutants are generated through either in vivo or in vitro transposition and 
subsequent selection for antibiotic resistance. These mutants are pooled, and optionally competed in test conditions, then genomic DNA is extracted 
and fragmented by restriction digest or physical shearing. Sequencing adaptors are ligated, some protocols then perform a step to purify fragments 
containing transposon insertions, and PCR with transposon- and adaptor-specific primers is used to specifically enrich for transposon-containing frag- 
ments. The fragments are then sequenced and mapped back to a reference genome to uniquely identify insertion sites with nucleotide-resolution. 
Dashed boxes indicate steps which differ between protocols. 



integration at any particular site can be assumed to be roughly 
equal, it is straightforward to calculate the probability that any 
particular region lacks insertions by chance. Himar1 transposition 
can also be conducted in vitro in the absence of any host 
factors, 19 and inserted transposons can then be transferred to the 
genomes of naturally transformable bacteria through homolo- 
gous recombination. 20 This can be advantageous when working 
with naturally transformable bacteria with poor electroporation 
efficiency. 8,9 It is worth noting that Tn5 is also capable of transpo- 
sition in vitro, 21 and could potentially be used to increase inser- 
tion density and, hence, the resolution of the assay, particularly 
in GC-rich genomic regions. 
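
The chance calculation described above for mariner libraries can be sketched as follows. This is a minimal illustration, assuming that every insertion in the library lands independently and uniformly on one of the genome's TA sites; the function name and the example numbers are hypothetical and are not taken from any of the cited studies.

```python
def prob_region_empty(region_ta_sites, genome_ta_sites, n_insertions):
    """Probability that a region receives no insertions purely by chance.

    Assumes each insertion lands independently and uniformly on one of
    the genome's TA sites (an idealization of mariner behavior).
    """
    if genome_ta_sites <= 0 or not 0 <= region_ta_sites <= genome_ta_sites:
        raise ValueError("need 0 <= region_ta_sites <= genome_ta_sites, genome_ta_sites > 0")
    # A single insertion misses the region with probability 1 - k/T.
    miss = 1.0 - region_ta_sites / genome_ta_sites
    # All n insertions must miss the region independently.
    return miss ** n_insertions


# Hypothetical library: 70,000 TA sites in the genome, 40,000 insertions.
p_short = prob_region_empty(5, 70_000, 40_000)   # short region with 5 TA sites
p_gene = prob_region_empty(40, 70_000, 40_000)   # longer region with 40 TA sites
print(f"short region: {p_short:.3g}, longer region: {p_gene:.3g}")
```

Under these assumptions, a region with few TA sites is far more likely to be empty by chance than a longer one, which is why short or GC-rich regions are harder to call as required.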

Pool construction. Once mutants have been constructed, 
they are plated on an appropriate selective medium for the 
transposon chosen, and colonies are counted, picked and pooled. A 
disadvantage of this is that the mutants must be recreated for 
follow up or validation studies. Goodman et al. introduced a 
clever way around this in the INSeq protocol: by individually 
archiving mutants, then sequencing combinatorial mutant pools, 
it is possible to uniquely characterize 2^n insertion mutants by 
sequencing only n pools. 7 Each mutant is labeled with a unique 
binary string that indicates which pools it has been added to. 
These binary strings can then be reconstructed for each insertion 
observed in these pools by recording their presence or absence in 
sequencing data, providing a unique pattern relating insertions 
to archived mutants. The authors control false identifications 
due to errors in sequencing by requiring that each binary label 
have a minimum edit distance to every other label, allowing for 
a robust association of labels with insertions despite sometimes 
noisy sequencing data. As a proof of concept, the authors were 
able to identify over 7,000 Bacteroides thetaiotaomicron mutants 
from only 24 sequenced pools. This effectively uses methods for 



the generation of random transposon pools to rapidly generate 
defined mutant arrays, though it is heavily dependent on liquid- 
handling robotics. 
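
The combinatorial pooling idea can be illustrated with a short sketch. It is not the INSeq implementation: it assumes that Hamming distance stands in for the "edit distance" between labels, uses a simple greedy search to pick labels, and works at a toy scale (10 pools, 12 mutants). All function names are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of pools in which two equal-length binary labels differ."""
    return sum(x != y for x, y in zip(a, b))


def make_labels(n_pools, n_mutants, min_dist):
    """Greedily choose binary labels that differ pairwise in >= min_dist pools."""
    labels = []
    for cand in product((0, 1), repeat=n_pools):
        if not any(cand):
            continue  # a mutant placed in no pool would never be observed
        if all(hamming(cand, lab) >= min_dist for lab in labels):
            labels.append(cand)
            if len(labels) == n_mutants:
                return labels
    raise ValueError("not enough labels for this n_pools/min_dist")


def decode(observed, labels):
    """Index of the archived mutant whose label is closest to the observation."""
    return min(range(len(labels)), key=lambda i: hamming(observed, labels[i]))


labels = make_labels(n_pools=10, n_mutants=12, min_dist=3)

# Simulate one sequencing error: the presence call for mutant 5 is wrong in pool 0.
noisy = list(labels[5])
noisy[0] ^= 1
print(decode(tuple(noisy), labels))
```

With a minimum pairwise distance of 3, any single wrong presence/absence call still leaves the observed pattern closest to the correct label, which is the property the minimum-distance requirement is meant to provide.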

Enrichment of transposon-insertion junctions. Once pools 
have been constructed, they are grown in either selective or 
permissive conditions, depending on the experiment, and then 
genomic DNA is extracted. Fragmentation proceeds either 
through restriction digestion in the case of transposons modified 
to contain appropriate sites 7,9,10 or via physical shearing, 6,8 
then sequencing adapters are ligated to the resulting fragments. 
PCR is performed on these fragments using a transposon-spe- 
cific primer and a sequencing adaptor-specific primer to enrich 
for fragments spanning the transposon-genomic DNA junction. 
Some protocols purify fragments containing transposon inser- 
tions using biotinylated primers 10,11 or PAGE 7 before and/or after 
PCR enrichment. The purification step from the Tn-seq Circle 
protocol is particularly unusual in that restriction-digested frag- 
ments containing transposon sequence are circularized before 
being treated with an exonuclease that digests all fragments with- 
out transposon insertions, theoretically completely eliminating 
background. 10 Given the success of protocols that do not include 
a purification step and the lack of systematic comparisons, it 
is currently unclear whether including one provides any major 
advantages. 

Reproducibility, Accuracy, and 
Concordance with Previous Methods 

A number of studies have looked at the reproducibility of trans- 
poson-insertion sequencing. Multiple studies using different 
protocol variations have repeatedly shown extremely high repro- 
ducibility in the number of insertions per gene (correlations of 



1162 



RNA Biology 



Volume 10 Issue 7 






~90%) in replicates of the same library grown and sequenced 
independently, 7,9,10 and good reproducibility (correlations 
between 70-90%) in independently constructed non-saturated 
libraries. 9,22 Van Opijnen and Camilli 22 compared traditional 
1 x 1 competition experiments between wild-type and mutant 
Streptococcus pneumoniae to results obtained by transposon-inser- 
tion sequencing and showed that there was no significant differ- 
ence in results over a range of tested conditions. 
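
As an illustration of how such reproducibility figures can be computed, the sketch below calculates a Pearson correlation between per-gene insertion counts from two replicates. The choice of Pearson correlation is an assumption made here (the studies cited above may have used other measures), and the counts are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical insertions-per-gene counts for the same six genes in two replicates.
replicate_1 = [12, 0, 33, 7, 21, 2]
replicate_2 = [10, 1, 30, 9, 25, 0]
r = pearson(replicate_1, replicate_2)
print(f"r = {r:.2f}")
```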

The accuracy of transposon-insertion sequencing in deter- 
mining library composition has also been assessed. Zhang et al. 
constructed a library of identified transposon-insertion mutants 
in known relative quantities, and then were able to recover the 
relative mutant prevalence with transposon-insertion sequenc- 
ing. 23 Additionally, by estimating the number of PCR templates 
prior to enrichment, this study showed that there is a high cor- 
relation between enrichment input and sequencing output. 

Two studies have evaluated concordance between results 
obtained with transposon-insertion sequencing and microarray 
monitoring of transposon insertions in order to demonstrate the 
enhanced accuracy and dynamic range of sequencing over previ- 
ous methods. In the first, 19 libraries of 95 enterohemorrhagic 
Escherichia coli (EHEC) transposon mutants that had previ- 
ously been screened in cattle using signature-tagged mutagenesis 
(STM), were pooled and re-evaluated using the TraDIS proto- 
col. 24 The original STM study had identified 13 insertions in 11 
genes attenuating intestinal colonization in a type III secretion 
system located in the locus of enterocyte effacement (LEE). 25 By 
applying sequencing to the same samples, an additional 41 muta- 
tions in the LEE were identified, spanning a total of 21 genes. 
Additional loci outside the LEE, which have been previously 
implicated in intestinal colonization but had not been detected 
by STM, were also reported by TraDIS. 

The second study re-evaluated genes required for optimal 
growth determined by TraSH in Mycobacterium tuberculosis. 26,27 
The greater dynamic range of sequencing as compared with 
microarrays allowed easier discrimination between insertions 
that were nonviable and those that were only significantly under- 
represented. The authors estimate that genes called as required by 
sequencing in their study are at least 100-fold underrepresented in 
the pool. In comparison, the threshold in the previous microarray 
experiment reported genes that had log probe ratios at least 5-fold 
lower than average between transposon-flanking DNA hybrid- 
ization and whole genomic DNA hybridization. Additionally, the 
nucleotide-resolution of insertion sequencing allowed the authors 
to identify genes which had required regions, likely correspond- 
ing to required protein domains, 23 but which tolerated insertions 
in other regions. Altogether, the authors increase the set of genes 
predicted to be required for growth in laboratory conditions in 
M. tuberculosis by more than 25% (from 614 to 774). 

Gene Requirements 

The earliest application of transposon-insertion sequencing was 
to determine the minimal set of genes necessary for the survival 
of Mycoplasma. 5 This essential genome is of great interest to 
synthetic and systems biology, where it is seen as a foundation for 



engineering cell metabolism, and in infection biology and medi- 
cine where it is seen as a promising target for therapies. However, 
it is important to remember that "essentiality" is always relative 
to growth conditions: a biosynthetic gene that is non-essential in 
a growth medium supplying a particular nutrient may become 
essential in a medium that lacks it. Traditionally, gene essentiality 
has been determined in clonal populations; 1,28,29 since the high- 
throughput transposon sequencing protocols described here nec- 
essarily contain a short period of competitive growth before DNA 
extraction, many of these studies prefer to refer to the "required" 
genome for the particular conditions under evaluation. 

Because of this short period of competitive growth, and 
because many otherwise required genes tolerate insertions in 
their terminus 7,27,30 or outside essential domains, 23 the determi- 
nation of required genomic regions is not completely straight- 
forward and a number of approaches have been taken to counter 
this. These include calling only genes completely lacking 
insertions as required, 9 and determining a cut-off based on the 
empirical or theoretical distribution of gene-wise insertion 
densities. 6,15,27,30 
Additionally, windowed methods have been developed which 
can be used to identify essential regions in the absence of gene 
annotation, 23,31 and have had success in identifying required 
protein domains, promoter regions and non-coding RNAs 
(ncRNAs). The organisms that have been evaluated for gene 
requirements under standard laboratory conditions are summa- 
rized in Table 1. 
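
A gene-wise insertion density with a simple cut-off can be sketched as below. This is an illustration under stated assumptions rather than any study's published method: the 10% trim of the 3' end, the cut-off value, and all coordinates are hypothetical, and the cited studies derive their cut-offs from the empirical or theoretical distribution of densities.

```python
from bisect import bisect_left, bisect_right


def insertion_index(sites, start, end, strand="+", trim_3prime=0.1):
    """Unique insertion sites per base pair of a gene.

    sites: sorted, de-duplicated insertion coordinates (1-based).
    start, end: inclusive gene coordinates with start <= end.
    trim_3prime: fraction of the gene's 3' end to ignore (an assumption
    here), since insertions near the terminus are often tolerated.
    """
    trim = int((end - start + 1) * trim_3prime)
    if strand == "+":
        end -= trim      # 3' end lies at the high coordinate
    else:
        start += trim    # 3' end lies at the low coordinate
    n_sites = bisect_right(sites, end) - bisect_left(sites, start)
    return n_sites / (end - start + 1)


# Hypothetical insertion sites and gene coordinates, for illustration only.
sites = sorted({105, 150, 195, 420, 431, 460, 512, 590})
genes = {"geneA": (101, 200, "+"), "geneB": (401, 600, "-"), "geneC": (701, 900, "+")}
CUTOFF = 0.01  # hypothetical threshold, not a value from the literature

index = {g: insertion_index(sites, s, e, strand) for g, (s, e, strand) in genes.items()}
candidates = [g for g, value in index.items() if value < CUTOFF]
print(index, candidates)
```

Genes whose density falls below the cut-off are only candidates for being required; as noted above, low density can also arise by chance in short genes or from insertion bias.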

In agreement with previous studies, 1,28 many required genes 
identified by transposon-insertion sequencing are involved in 
fundamental biological processes such as cell division, DNA 
replication, transcription and translation, 6,7,15,27 and many of 
these requirements appear to be conserved between genera and 
classes. 15,16 However, a recent study defining required gene sets 
in Salmonella serovars has found that phage repressors, neces- 
sary for maintaining the lysogenic state of the prophage, are also 
required, 15 even though mobile genetic elements such as phage 
are usually considered part of the accessory genome. This study 
also highlights the need for temperance when interpreting the 
results of high-throughput assays of gene requirements. For 
example, many genes in Salmonella Pathogenicity Island 2 (SPI- 
2) did not exhibit transposon-insertions, despite clear evidence 
from directed knockouts showing that these genes are non- 
essential for viability or growth. Under laboratory conditions, 
SPI-2 is silenced by the nucleoid-forming protein H-NS, 32,33 
which acts by oligomerizing along silenced regions of DNA, 
blocking RNA polymerase access. A previous study has shown 
that transposon insertion "cold spots" can be caused by competi- 
tion between high-density proteins and transposases for DNA. 34 
This suggests that H-NS may be restricting transposase access to 
DNA, though this has not previously been observed in transpo- 
son-insertion sequencing data, and will require additional work 
to confirm. 

Defining Conditional Gene Requirements 

One of the most valuable applications of the transposon-insertion 
sequencing method is the ability to identify genes important in 
a condition of interest, by comparing differences in the numbers 
of sequencing reads from input (control) mutant pools to out- 
put (test) pools that have been subject to passaging in a certain 
growth condition. Insertion counts are compared from cells in 
the input pool and those after passage, thereby identifying genes 
that either enhance or detract from survival and/or growth in 
the given condition, defined by decreased or increased insertion 
frequency, respectively. A further application of this method 
involves comparing insertions between biologically linked con- 
ditions, such as cellular stresses or different stages of a murine 
infection, to gain insight into complex systems. 22 
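
The input-versus-output comparison can be sketched as a per-gene log2 fold change of normalized read counts. This is a simplified illustration: the reads-per-million normalization, the pseudocount, and the example counts are assumptions made here, and a real analysis would also use replicates and a statistical test.

```python
from math import log2


def log2_fold_changes(input_counts, output_counts, pseudocount=1.0):
    """Per-gene log2(output/input) after scaling each pool to reads per million."""
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    if in_total == 0 or out_total == 0:
        raise ValueError("both pools must contain reads")
    result = {}
    for gene, n_in in input_counts.items():
        n_out = output_counts.get(gene, 0)
        in_rpm = n_in / in_total * 1e6
        out_rpm = n_out / out_total * 1e6
        # The pseudocount keeps genes with zero output reads finite.
        result[gene] = log2((out_rpm + pseudocount) / (in_rpm + pseudocount))
    return result


# Hypothetical read counts summed over all insertions within each gene.
input_pool = {"geneA": 1000, "geneB": 1000, "geneC": 1000}
output_pool = {"geneA": 1000, "geneB": 10, "geneC": 1990}
lfc = log2_fold_changes(input_pool, output_pool)
for gene, value in lfc.items():
    print(gene, round(value, 2))
```

A negative value marks a gene whose disruption reduced fitness in the test condition (insertions were depleted), while a positive value marks a gene whose disruption increased fitness.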

So far, transposon-insertion sequencing has been used to 
investigate a number of interesting biological questions: bile 
tolerance in S. Typhi 6 and S. Typhimurium, 35 bacteriophage 
infection of S. Typhi, 36 antibiotic resistance in Pseudomonas 
aeruginosa, 10 cholesterol utilization in M. tuberculosis 27 and a 
number of stress and nutrient conditions in S. pneumoniae. 22 
Transposon-insertion sequencing of populations passed through 
murine models has been used to assess genes required to establish 
the gut commensal B. thetaiotaomicron in its niche, 7 for 
Haemophilus influenzae infection, 8 as well as S. pneumoniae 
responses to two in vivo niches, 
the lung and nasopharynx. 22 A further extension of the method 
examined double mutant libraries, that is transposon mutant 
libraries generated in a defined deletion background, to tease 
apart complex networks of regulatory genes. 9 

Two studies in particular illustrate the power of using trans- 
poson-insertion sequencing to identify conditionally required 
genes. In the first, Goodman et al. set out to determine the genes 
necessary for the establishment of the commensal B. thetaiotao- 
micron in a murine model. 7 First, the growth requirements of 
transposon mutant populations in the cecum of germ-free mice 
were assessed, and genes required for growth in monoassociation 
with the host were found to be enriched in functions such as 
energy production and amino acid metabolism. By further com- 
paring monoassociated transposon mutant libraries with those 
grown in the presence of three defined communities of human 
gut-associated bacteria, the authors identified a locus upregulated 
by low levels of vitamin B12 that is only required in the absence 
of other bacteria capable of synthesizing B12. This showed that 
the gene requirements of any particular bacterium in the gut are 
at least partially dependent on the metabolic capabilities of the 
entire community and emphasizes the importance of testing in 
vivo conditions to complement in vitro study. 

The second study, conducted by van Opijnen and Camilli, 
aimed to map the genetic networks involved in a range of cellular 
stress responses in S. pneumoniae. 22 Seventeen in vitro conditions 
were tested, including: pH, nutrient limitation, temperature, anti- 
biotic, heavy metal and hydrogen peroxide stress. Approximately 
6% of disrupted genes resulted in increased fitness in some con- 
dition, suggesting that some genes are maintained despite being 
detrimental to the organism under particular conditions. These 
would be interesting candidates for further functional and evo- 
lutionary study, as the maintenance of these genes is presumably 
highly dependent on the conditions the bacterium faces, and may 
have implications for our understanding of e.g., gene loss in the 
process of bacterial host adaptation. 37 Two additional in vivo 



experiments were performed in a murine model, where cells were 
recovered from the lung and nasopharynx. By combining these data, 
the authors identified over 1,800 genotype-phenotype genetic 
interactions. These interactions were mapped and pathways identified. 
Between the two in vivo niches, certain stress response pathways 
were markedly different. For example, temperature stress produced 
a distinct response in the lung, compared with the nasopharynx, 
which is perhaps to be expected as temperature varies 
greatly between these two sites. By further examining sub-path- 
ways required in the two different niches and comparing them to 
in vitro requirements, the authors were able to draw conclusions 
regarding the conditions S. pneumoniae faces when establishing an 
infection. This comprehensive mapping of genotype-phenotype 
relationships will serve as an important atlas for further studies. 

Monitoring ncRNA Contributions to Fitness 

To date, four studies have used transposon-insertion sequencing 
to examine the contribution of non-coding RNAs (ncRNAs) 
and other non-coding regions to organismal fitness (see Table 1). 
Two of these examined requirements for non-coding regions in 
the relatively under-explored bacterial species Caulobacter 
crescentus 16 and M. tuberculosis. 23 Both utilized analytical 
techniques that allowed for the identification of putative required 
regions in the absence of genome annotation. Twenty-seven small 
RNAs (sRNAs) had previously been detected in C. crescentus; 38 six 
were found to be depleted in transposon insertions indicating an 
important role in basic cellular processes. Additionally, the well- 
characterized ncRNAs tmRNA and RNaseP, as well as 29 non- 
redundant tRNAs, were found to be required. An additional 90 
unannotated non-disruptable regions were identified throughout 
the genome, implying an abundance of unexplored functional 
non-coding sequence. 

While the non-coding transcripts of M. tuberculosis have 
been explored more thoroughly than those of C. crescentus, most 
remain functionally uncharacterized, though there are hints that 
some of these may be involved in pathogenicity. 39 Using a mari- 
ner transposon-based assay and a windowed statistical analysis 
that accounted for the distribution of potential TA integration 
sites, 35 intergenic regions were identified as putatively required 
in the M. tuberculosis genome. 23 In common with the C. cres- 
centus study, the RNA component of RNase P, required for the 
maturation of tRNAs, and tmRNA, involved in the freeing of 
stalled ribosomes, were identified as required (Fig. 2A) together 
with 10 non-redundant tRNAs and potential promoter regions. 
However, due to the lower overall insertion density and lack of 
TA sites in some GC-rich regions, there were some regions that 
could not be assayed and the resolution was limited to 250 bases. 
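
An annotation-free, windowed scan that accounts for the distribution of TA sites might look like the sketch below. It is not the statistical method used in the study described above: it assumes each TA site is hit independently with a probability equal to the library-wide fraction of TA sites hit, and the window size, step, and toy sequence are hypothetical.

```python
def find_ta_sites(seq):
    """0-based positions of every TA dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def scan_windows(seq, hit_sites, window=500, step=250):
    """Return (start, n_ta, n_hit, p_empty) for each sliding window.

    p_empty is the chance that a window with n_ta TA sites would contain
    no insertions if every TA site were hit independently with the
    library-wide hit frequency (a simplifying assumption).
    """
    ta = find_ta_sites(seq)
    if not ta:
        raise ValueError("sequence contains no TA sites")
    hits = set(hit_sites) & set(ta)
    density = len(hits) / len(ta)
    rows = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        in_window = [p for p in ta if start <= p < start + window]
        n_hit = sum(p in hits for p in in_window)
        # Windows without TA sites get p_empty == 1.0 and cannot be assayed.
        rows.append((start, len(in_window), n_hit, (1.0 - density) ** len(in_window)))
    return rows


# Toy 200-bp sequence with a TA site every 4 bases; only the first half is hit.
seq = "TACG" * 50
hit_sites = list(range(0, 100, 4))
rows = scan_windows(seq, hit_sites, window=100, step=100)
for row in rows:
    print(row)
```

A window is a candidate required region only when it has no insertions and a small `p_empty`; windows with few or no TA sites have a high `p_empty` and therefore cannot be called, which mirrors the limitation in GC-rich regions noted above.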

A recent study has examined ncRNA requirements in the 
S. enterica serovars Typhi and Typhimurium. 15 Using the tRNAs 
as a model set of ncRNAs, this study showed that the high trans- 
poson insertion density achieved by the TraDIS protocol is capa- 
ble of assaying the requirement for genomic regions as small as 
70-80 bases. S. enterica, together with the closely related E. coli, 
has served as a model organism for the discovery and elucidation 
of ncRNA function, and extensive annotations of non-coding 

Figure 2. Applications of transposon-insertion sequencing to non-coding RNAs. (A) Plots of genomic regions in Mycobacterium tuberculosis 
containing the required non-coding RNAs RNase P (top) and tmRNA (bottom). Tracks, from top to bottom: 1. Histogram of insertion counts, 2. Comprehensive 
heat-map of requirement of 500-bp windows, 3. Position of annotated genes, 4. Position of TA dinucleotide sites, 5. Position of non-coding RNA. 
Reproduced from reference 23. (B) 1 x 1 competition assays validate attenuating Streptococcus pneumoniae sRNA mutants identified by 
transposon-insertion sequencing. Mice were infected with defined deletions of sRNAs identified as attenuating by Tn-seq and wild-type S. pneumoniae TIGR4 at 
the body site indicated and bacterial densities were compared 24 h post-infection. These plots show the derived competitive index in blood (top) and 
the nasopharynx (bottom). Each point represents the result of a competition experiment between an sRNA deletion mutant and wild-type TIGR4. A 
competitive index of 1 indicates equivalent numbers of mutants and wild-type were recovered. Modified from reference 46. 



transcripts are available. 40-44 As a result, this study was able to 
assay approximately 300 non-coding regions with evidence for 
function or transcription. Among the ncRNAs identified as 
required were RNase P; the RNA component of the signal recog- 
nition particle, involved in targeting proteins to the plasma mem- 
brane; and a number of known autoregulatory ribosomal protein 
leader sequences, 45 as well as providing evidence for a novel leader 
sequence, StyR-8, 43 which appears to be involved in the auto- 
regulation of the rpmB gene. In total, this study identified 15 
confirmed and putative ncRNAs required for robust competitive 
growth on rich media in both serovars, including a number of 
known sRNAs involved in stress response. 

A particularly exciting study has been conducted in S. pneu- 
moniae TIGR4 combining RNA-seq with transposon-insertion 



sequencing. 46 To identify sRNA loci, the authors first sequenced 
size-selected RNA from the wild-type and three two-compo- 
nent system knockouts, identifying 89 putative sRNAs, 56 of 
which were novel. Fifteen of these candidates, selected on the 
basis of high expression and low predicted folding free energy, 
were assayed for their ability to establish invasive disease in a 
murine model. Of these, eight sRNA deletions showed a signifi- 
cant attenuation of disease. To more broadly establish the roles 
of sRNAs in infecting particular organs, transposon insertion 
libraries were administered directly to the nasopharynx, lungs 
or blood of mice, and bacteria were harvested following disease 
progression. Disruption of 26, 28 and 18 sRNAs was found to 
attenuate infection in the nasopharynx, lung and blood, respectively. 
These results were then validated with targeted deletions of 11 

Table 2. Advantages and limitations of transposon-insertion sequencing 

Advantages 

Library construction is extremely rapid in comparison to targeted 
deletion libraries. 

Gene requirements and fitness effects can be quickly assayed in a wide 
range of conditions. 

The precise location of transposon insertions can be determined due to 
the nucleotide resolution of high-throughput sequencing. 

Wide dynamic range compared with older microarray-based 
technologies. 

Requirements and fitness effects of genomic regions can be determined 
independently of annotation. 

Limitations 

Requirements for particular nucleotides at transposon-insertion sites or 
insertion biases can limit resolution. 

Determination of gene essentiality is dependent on insertion density, and 
is less conclusive than targeted gene deletion in clonal populations. 

Only genomic regions that tolerate insertions under the conditions of 
library creation may be assayed for fitness effects in further conditions. 

The dynamic range for fitness effects is dependent on mutant abundance 
in the initial library and may be limiting for some genes. 

Mutants must be reconstructed for follow-up experiments in the absence 
of specialized protocols and robotics (see e.g., Goodman, 2009 7 ). 



sRNAs (Fig. 2B). In addition to establishing the role of sRNAs 
in S. pneumoniae virulence, this study illustrated the power of 
combining RNA-seq and transposon-insertion sequencing to 
rapidly assign phenotypes to non-coding sequences. 

Limitations. In this review, we have largely focused on the 
potential of transposon-insertion sequencing. However, this 
technology does have a number of important limitations, which 
we collect here and summarize in Table 2. As discussed previ- 
ously, requirements for particular nucleotides at insertion sites, 
such as the TA required by the Mariner transposon, or preference 
for certain sequence composition, such as the AT bias exhibited 
by Tn5, can limit the density of observed insertions in certain 
genomic regions. This may impact any downstream analysis, 
and can potentially bias results, particularly the determina- 
tion of gene requirements. Even if this bias has been accounted 
for, transposon-insertion screens will always over-predict gene 
requirements in comparison to targeted deletion libraries as 
discussed previously. However, this over-prediction can be con- 
trolled either through careful consideration of known insertion 
biases as in many Mariner-based studies, or by high-insertion 
densities, such as those achieved in several Tn5-based stud- 
ies (Table 1). Once the library has been created, only regions 
that have accumulated insertions in the conditions of library 
creation will be able to be assayed for fitness effects in further 
conditions. This means that regions that lead to slow growth 
phenotypes when disrupted in standard laboratory conditions 
may be difficult to assay in other conditions. Additionally, the 
dynamic range of fitness effects detected will depend on the 
complexity of the input libraries. The absence of insertions 
may be a particular problem for assaying small genomic ele- 
ments, such as sRNAs or short ORFs. Finally, the validation 
of hypotheses derived from transposon-insertion sequencing 
will require the construction of targeted deletions, as individ- 
ual mutants cannot be recovered from pools unless specialized 
protocols have been followed during library construction (as in 
Goodman, 2009 7 ). 

The Future of Transposon-insertion Sequencing 

Transposon-insertion sequencing is a robust and powerful tech- 
nique for the rapid connection of genotype to phenotype in a 



wide range of bacterial species. Already, a number of studies have 
demonstrated the effectiveness of this method and the results 
have been far-reaching: enhancing our understanding of basic 
gene functions, establishing requirements for colonization and 
infection, mapping complex metabolic pathways and exploring 
non-coding genomic "dark matter." Due to the range of poten- 
tial applications of transposon-insertion sequencing, along with 
the decreasing cost and growing accessibility of next-generation 
sequencing, we believe that this method will become increasingly 
common in the near future. 

A number of bacterial species have already been subjected to 
transposon-insertion sequencing (Table 1). Microarray-based 
approaches to monitoring transposon mutant libraries have even 
been applied to eukaryotic systems, 47 and similarly transposon- 
insertion sequencing can potentially be applied to any system 
where the creation of large-scale transposon mutant libraries is 
technologically feasible. Recently, the Genomic Encyclopedia of 
Bacteria and Archaea (GEBA) 48 has been expanding our knowledge 
of bacterial diversity through targeted genomic sequencing 
of underexplored branches of the tree of life. Applying transpo- 
son-insertion sequencing in a comparative manner 15 across the 
bacterial phylogeny will provide an unprecedented view of the 
determinants for survival in diverse environments. While most 
transposon-insertion sequencing studies to date have focused on 
pathogenic bacteria, these techniques could also have applications 
in energy production, bioremediation and synthetic biology. 

The combination of transposon-insertion sequencing with 
other high-throughput and computational methods is already 
proving to be fertile ground for enhancing our understanding 
of bacterial systems. For instance, by using transposon-insertion 
sequencing in a collection of relatively simple conditions com- 
bined with a computational pathway analysis, van Opijnen and 
Camilli were able to provide a holistic understanding of the 
genetic subsystems involved in a complex process such as S. pneu- 
moniae pathogenesis. 22 In the future, methods to assay phenotype 
in a high-throughput manner 49,50 may be combined with 
transposon-insertion sequencing to provide exhaustive simple 
genotype-phenotype associations with which to understand complex 
processes in a systems biology framework. We look forward to 
the adoption of these data sets by the community as an important 
tool for rapid hypothesis generation. 